When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Jeong, Soyeong, Jung, Taehee, Hwang, Sung Ju, Kim, Joo-Kyung, Kang, Dongyeop

arXiv.org Artificial Intelligence

Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt, enabling new opportunities for knowledge-intensive multi-hop reasoning by integrating large sets of retrieved documents or, in some cases, all necessary information directly. However, simply feeding more documents into the context window fails to capture how evidence should be connected. We address this gap with thought templates, which recast reasoning as reusable thought caches derived from prior problem-solving traces, structuring how evidence is combined and guiding multi-hop inference with factual documents. To keep these templates effective, we propose an update strategy that iteratively refines templates derived from training data through natural-language feedback. Across diverse benchmarks and LCLM families, our approach delivers consistent gains over strong baselines in both retrieval-based and retrieval-free settings. Furthermore, we show that optimized templates can be distilled into smaller open-source models, demonstrating their broad applicability and transparent reasoning reuse. We refer to our framework as Thought Template Augmented LCLMs (ToTAL).


WTU-EVAL: A Whether-or-Not Tool Usage Evaluation Benchmark for Large Language Models

Ning, Kangyun, Su, Yisong, Lv, Xueqiang, Zhang, Yuanzhe, Liu, Jian, Liu, Kang, Xu, Jinan

arXiv.org Artificial Intelligence

Although Large Language Models (LLMs) excel in NLP tasks, they still need external tools to extend their abilities. Current research on tool learning with LLMs often assumes mandatory tool use, which does not always align with real-world situations, where the necessity of tools is uncertain and incorrect or unnecessary tool use can damage the general abilities of LLMs. We therefore propose to explore whether LLMs can discern their ability boundaries and use tools flexibly. We introduce the Whether-or-not Tool Usage Evaluation benchmark (WTU-Eval) to assess LLMs on eleven datasets, six of which are tool-usage datasets and five of which are general datasets. LLMs are prompted to use tools according to their needs. The results of eight LLMs on WTU-Eval reveal that LLMs frequently struggle to determine tool use in general datasets, and that their performance on tool-usage datasets improves when their general ability approaches ChatGPT's. In both dataset types, incorrect tool usage significantly impairs performance. To mitigate this, we also develop a fine-tuning dataset to enhance tool decision-making. Fine-tuning Llama2-7B yields a 14% average performance improvement and a 16.8% decrease in incorrect tool usage. We will release the WTU-Eval benchmark.


What Happened When ChatGPT Got Hold of My Online Dating Profile - CNET

#artificialintelligence

For the record, I don't own socks with sloths on them. I have three pairs with the CNET logo on them. ChatGPT thinks I might, though, and it also thinks this fact could get me matches on Hinge, or Bumble, or any dating app that has the audacity to ask me for a random fact about myself. Click to read more Love Syncs. Here's a random fact about me: When I tested how ChatGPT might handle rewriting my dating app profile, the experimental AI chatbot tried to turn me into a cringey manic pixie dream girl who forgets to water her "jungle" of houseplants, dances to her favorite "tunes" and is looking for "a fellow weirdo" to go on *shudders* "adventures" with.



Will It Blend? Mixing Training Paradigms & Prompting for Argument Quality Prediction

van der Meer, Michiel, Reuver, Myrthe, Khurana, Urja, Krause, Lea, Santamaría, Selene Báez

arXiv.org Artificial Intelligence

This paper describes our contributions to the Shared Task of the 9th Workshop on Argument Mining (2022). Our approach uses Large Language Models for the task of Argument Quality Prediction. We perform prompt engineering using GPT-3, and also investigate three training paradigms: multi-task learning, contrastive learning, and intermediate-task training. We find that a mixed prediction setup outperforms single models. Prompting GPT-3 works best for predicting argument validity, while argument novelty is best estimated by a model trained using all three training paradigms.


SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Lee, Kimin, Laskin, Michael, Srinivas, Aravind, Abbeel, Pieter

arXiv.org Artificial Intelligence

Model-free deep reinforcement learning (RL) has been successful in a range of challenging domains. However, some issues remain, such as stabilizing the optimization of nonlinear function approximators, preventing error propagation due to the Bellman backup in Q-learning, and achieving efficient exploration. To mitigate these issues, we present SUNRISE, a simple unified ensemble method that is compatible with various off-policy RL algorithms. SUNRISE integrates three key ingredients: (a) bootstrap with random initialization, which improves the stability of the learning process by training a diverse ensemble of agents, (b) weighted Bellman backups, which prevent error propagation in Q-learning by reweighting sample transitions based on uncertainty estimates from the ensembles, and (c) an inference method that selects actions with the highest upper-confidence bounds for efficient exploration. Our experiments show that SUNRISE significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks in both low-dimensional and high-dimensional environments. Our training code is available at https://github.com/pokaxpoka/sunrise.


Variational Calibration of Computer Models

Marmin, Sébastien, Filippone, Maurizio

arXiv.org Machine Learning

Bayesian calibration of black-box computer models offers an established framework for obtaining a posterior distribution over model parameters. Traditional Bayesian calibration involves emulating the computer model and an additive model-discrepancy term using Gaussian processes; inference is then carried out using MCMC. These choices pose computational and statistical challenges and limitations, which we overcome by proposing the use of approximate Deep Gaussian processes and variational inference techniques. The result is a practical and scalable framework for calibration that achieves performance competitive with the state of the art.


Generalized Earthquake Frequency-Magnitude Distribution Described by Asymmetric Laplace Mixture Modelling

Mignan, Arnaud

arXiv.org Machine Learning

The complete part of the earthquake frequency-magnitude distribution (FMD), above the completeness magnitude mc, is well described by the Gutenberg-Richter law. The parameter mc, however, varies in space due to the seismic network configuration, yielding a convoluted FMD shape below max(mc). This paper investigates the shape of the generalized FMD (GFMD), which may be described as a mixture of elemental FMDs (eFMDs) defined as asymmetric Laplace distributions of mode mc [Mignan, 2012, https://doi.org/10.1029/2012JB009347]. An asymmetric Laplace mixture model (GFMD-ALMM) is thus proposed, with its parameters (detection parameter kappa, Gutenberg-Richter beta-value, mc distribution, as well as the number K and weights w of eFMD components) estimated using a semi-supervised hard expectation-maximization approach with BIC penalties for model complexity. The performance of the proposed method is analysed, with encouraging results: kappa, beta, and the mc distribution range are retrieved for different GFMD shapes in simulations, as well as in regional catalogues (southern and northern California, Nevada, Taiwan, France), in a global catalogue, and in an aftershock sequence (Christchurch, New Zealand). We find max(mc) to be conservative compared to other methods, and kappa = k/log(10) = 3 in most catalogues (compared to beta = b/log(10) = 1), but also that biases in kappa and beta may occur when rounding errors are present below completeness. The GFMD-ALMM, by modelling different FMD shapes in an autonomous manner, opens the door to new statistical analyses in the realm of incomplete seismicity data, which could in theory improve earthquake forecasting by considering roughly ten times more events.
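The elemental distribution described in the abstract can be sketched directly: an asymmetric Laplace density with mode mc rises exponentially with slope kappa below the mode (detection) and decays with slope beta above it (Gutenberg-Richter), and the GFMD is a weighted sum of such components. A minimal sketch follows; the function names are illustrative, and sharing a single kappa and beta across components is a simplifying assumption.

```python
import numpy as np

def ald_pdf(m, mc, kappa, beta):
    """Asymmetric Laplace density with mode mc: exponential rise with slope
    kappa below mc (detection part), exponential decay with slope beta above
    it (Gutenberg-Richter part). Normalized so it integrates to one."""
    m = np.asarray(m, dtype=float)
    norm = kappa * beta / (kappa + beta)
    return norm * np.where(m < mc,
                           np.exp(kappa * (m - mc)),
                           np.exp(-beta * (m - mc)))

def gfmd_pdf(m, modes, weights, kappa, beta):
    """Generalized FMD as a K-component mixture of elemental FMDs with
    component completeness magnitudes `modes` and weights `weights`
    (summing to one)."""
    return sum(w * ald_pdf(m, mc, kappa, beta)
               for w, mc in zip(weights, modes))
```

Fitting the mixture then amounts to estimating the modes, weights, kappa, and beta, which the paper does with a hard expectation-maximization scheme penalized by BIC.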